Table of Contents

Journal of Artificial Intelligence and Data Mining
Volume 11, Issue 2, Spring 2023

  • Publication date: 1402/04/26
  • Number of articles: 12
  • MohammadReza Keyvanpour *, Zahra Karimi Zandian, Nasrin Mottaghi Pages 161-186

    Regression test suite reduction is an essential phase in software testing: redundant and unnecessary test cases are eliminated without degrading the accuracy and performance of the software. Various methods have been proposed for regression test suite reduction, and the main challenge in this area is to provide a method that maintains fault-detection capability while reducing the test suite. In this paper, a new test suite reduction technique based on data mining is proposed. In addition to reducing the test suite, the method preserves its fault-detection capability by using both clustering and classification. Regression test cases are reduced with a bi-criteria data-mining-based method at two levels; at each level, different, useful coverage criteria and clustering algorithms are used to establish a better compromise between test suite size and the fault-detection ability of the reduced suite. The results of the proposed method have been compared with those of five other methods based on the PSTR and PFDL measures. The experiments show that the proposed method efficiently reduces the test suite while maintaining its fault-detection capability.

    Keywords: Test suite reduction, Software, data mining, Coverage criteria, Clustering
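The core idea of clustering-based test suite reduction can be sketched as follows. This is a minimal illustration, not the paper's bi-criteria, two-level method: each test case is represented by a hypothetical binary coverage vector, tests with near-identical coverage are grouped greedily, and one representative per group is kept.

```python
def hamming(a, b):
    """Number of positions where two binary coverage vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def reduce_suite(coverage, threshold=1):
    """Greedy clustering: each test joins the first kept test within
    `threshold` Hamming distance; otherwise it is kept as a new
    representative. Returns the indices of the kept test cases."""
    reps = []
    for i, vec in enumerate(coverage):
        for j in reps:
            if hamming(vec, coverage[j]) <= threshold:
                break  # redundant: close to an already-kept test
        else:
            reps.append(i)
    return reps

# Hypothetical coverage matrix: rows = test cases, columns = code branches.
suite = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],  # near-duplicate of test 0
    [0, 0, 1, 1],
    [0, 0, 1, 1],  # exact duplicate of test 2
]
print(reduce_suite(suite))  # → [0, 2]
```

Raising `threshold` shrinks the suite further at the cost of fault-detection ability, which is exactly the trade-off the paper's two levels aim to balance.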
  • Zahra Asghari Varzaneh, Soodeh Hosseini * Pages 187-194

    This paper proposes a fuzzy expert system for diagnosing diabetes. In the proposed method, fuzzy rules are first generated based on the Pima Indians Diabetes Database (PIDD), and then the fuzzy membership functions are tuned using Harris Hawks optimization (HHO). The experimental data set, PIDD restricted to the 25-30 age group, is first preprocessed, and the crisp values are converted into fuzzy values in the fuzzification stage. The HHO algorithm tunes the fuzzy membership functions to determine their best ranges and to increase the accuracy of fuzzy-rule classification; the improved fuzzy expert system thus achieves higher classification accuracy than several well-known methods for diabetes diagnosis. The experimental results, in terms of accuracy, sensitivity, and specificity, show that the proposed expert system diagnoses diabetes better than other data mining models.

    Keywords: Fuzzy Expert System, Harris Hawks Optimization, Membership functions, Diabetes
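The tuning idea can be sketched in a few lines. This is a minimal illustration with made-up data, and a plain random search stands in for the full Harris Hawks optimizer: a triangular membership function's breakpoints are optimized against labelled samples to maximize classification accuracy.

```python
import random

def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def accuracy(params, data):
    """Fraction of samples whose fuzzy label (membership > 0.5 means
    'high') matches the true label."""
    a, b, c = params
    correct = 0
    for x, label in data:
        pred = 'high' if triangular(x, a, b, c) > 0.5 else 'low'
        correct += (pred == label)
    return correct / len(data)

# Hypothetical glucose-like readings with 'high'/'low' labels.
data = [(90, 'low'), (100, 'low'), (140, 'high'), (150, 'high'), (160, 'high')]

random.seed(0)
# Random search over sorted breakpoint triples; HHO would replace this
# loop with its exploration/exploitation phases.
best = max(
    (sorted(random.uniform(80, 200) for _ in range(3)) for _ in range(200)),
    key=lambda p: accuracy(p, data),
)
print(accuracy(best, data))
```

The paper's contribution lies in using HHO, rather than a blind search, to move the breakpoints; the objective function plays the same role in both.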
  • A.M. Latif *, Z. Mehrnahad, J. Zarepour Pages 195-211

    In this paper, a novel scheme for lossless meaningful visual secret sharing using XOR properties is presented. In the first step, a genetic algorithm with an appropriately designed objective function creates noisy share images. These images contain no information about the input secret image, and the secret image is fully recovered by stacking them together. Because image transmission is subject to attacks, a new approach for constructing meaningful shares based on the properties of XOR is proposed. In the recovery scheme, the input secret image is fully recovered by an efficient XOR operation. The proposed method is evaluated using the PSNR, MSE, and BCR criteria. The experimental results show good outcomes compared with other methods, both in the quality of the share images and in the recovered image.

    Keywords: Secret sharing, Visual secret sharing, Meaningful secret sharing, Genetic Algorithm
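The lossless-recovery property of XOR sharing can be sketched directly. This is a minimal (n, n) illustration only: the paper additionally uses a genetic algorithm to make the shares meaningful, whereas here they are plain random noise.

```python
import random

def make_shares(secret, n, rng=random.Random(42)):
    """Split a list of pixel values (0-255) into n noise-like shares;
    the XOR of all n shares recovers the secret exactly."""
    shares = [[rng.randrange(256) for _ in secret] for _ in range(n - 1)]
    last = list(secret)
    for share in shares:          # fold the secret into the final share
        last = [p ^ s for p, s in zip(last, share)]
    return shares + [last]

def recover(shares):
    """XOR-stack all shares pixel-wise to rebuild the secret losslessly."""
    out = [0] * len(shares[0])
    for share in shares:
        out = [p ^ s for p, s in zip(out, share)]
    return out

secret = [12, 200, 7, 255, 0, 99]       # a toy 'image' row
shares = make_shares(secret, n=3)
print(recover(shares) == secret)        # → True
```

Because XOR is its own inverse, recovery is exact, which is what makes the scheme lossless, unlike classic stacking-based visual cryptography.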
  • Foad Ghaderi *, Amin Rahmati Pages 213-220

    Every facial expression involves one or more facial action units appearing on the face, so action unit recognition is commonly used to improve facial expression detection. It is important to identify the subtle changes in the face that occur when particular action units appear. In this paper, we propose an architecture that employs local features extracted from specific regions of the face together with global features taken from the whole face. To this end, we combine the SPPNet and FPN modules into an end-to-end network for facial action unit recognition. First, different predefined regions of the face are detected. Next, the SPPNet module captures deformations in the detected regions; it focuses on each region separately and cannot account for possible changes in other areas of the face. In parallel, the FPN module finds global features related to each of the facial regions. By combining the two modules, the proposed architecture captures both local and global facial features and improves action unit recognition. Experimental results on the DISFA dataset demonstrate the effectiveness of our method.

    Keywords: Facial action recognition, facial action units, deep learning
  • Mohammad Nazari, Hossein Rahmani *, Dadfar Momeni, Motahare Nasiri Pages 221-228

    Representing data as a graph can better define the relationships among data components and thus provide better and richer analysis. Movies have been represented as graphs many times, using different features, for clustering, genre prediction, and even recommender systems. In constructing movie graphs, little attention has been paid to textual features such as subtitles, even though they contain the entire content of the movie and a great deal of hidden information. In this paper, we therefore propose a method called MoGaL that constructs a movie graph by applying LDA to subtitles. Each node is a movie, and each edge represents a relationship discovered by MoGaL between the two associated movies. First, we extract the important topics of the movies by applying LDA to their subtitles. Then we connect the movies in a graph using cosine similarity. Finally, we evaluate the proposed method with respect to the genre homophily and genre entropy measures, on which MoGaL significantly outperforms the baseline method. Accordingly, our empirical results indicate that movie subtitles are a rich source of information for various movie analysis tasks.

    Keywords: Subtitle analysis, Movies graph, Graph analysis, Graph entropy, Graph homophily
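The graph-construction step can be sketched as follows. The topic vectors here are made-up stand-ins for LDA output on subtitles: an edge is added between two movies when the cosine similarity of their topic distributions exceeds a threshold.

```python
import math

def cosine(u, v):
    """Cosine similarity between two topic-distribution vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

topics = {                      # movie -> hypothetical LDA topic mixture
    "A": [0.8, 0.1, 0.1],
    "B": [0.7, 0.2, 0.1],
    "C": [0.1, 0.1, 0.8],
}
# Connect movie pairs whose topic mixtures are nearly parallel.
edges = [(m, n) for m in topics for n in topics
         if m < n and cosine(topics[m], topics[n]) > 0.9]
print(edges)                    # → [('A', 'B')]
```

The similarity threshold controls graph density; genre homophily and entropy then measure how well the resulting edges respect genre labels.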
  • Kourosh Kiani *, Fatemeh Alinezhad, Razieh Rastgoo Pages 229-236

    Gender recognition has been an attractive research area in recent years. A user-friendly gender recognition application requires an accurate, fast, and lightweight model that can run on a mobile device. Although successful results have been obtained with Convolutional Neural Networks (CNNs), such models need high computational resources that are not available in mobile and embedded applications. To overcome this challenge, and considering recent advances in deep learning, we propose a deep learning-based model for gender recognition on mobile devices using lightweight CNN models. A pretrained CNN, the Multi-Task Convolutional Neural Network (MTCNN), is used for face detection. Furthermore, the MobileFaceNet model is modified and trained using the Margin Distillation cost function, and Dense Blocks and depthwise separable convolutions are used to boost its performance. The proposed model outperforms the MobileFaceNet model on six datasets, with relative accuracy improvements of 0.02%, 1.39%, 2.18%, 1.34%, 7.51%, and 7.93% on LFW, CPLFW, CFP-FP, VGG2-FP, UTKFace, and our own data, respectively. In addition, we collected a dataset of 100,000 face images of males and females in different age categories; the images of women include subjects both with and without headgear.

    Keywords: deep learning, Gender recognition, Margin distillation, Dense block, MobileFaceNet
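A quick arithmetic sketch shows why depthwise separable convolutions suit lightweight mobile models such as MobileFaceNet: the example below compares parameter counts for a standard versus a depthwise separable 3x3 convolution (bias terms omitted; the channel sizes are illustrative, not the paper's).

```python
def standard_params(k, c_in, c_out):
    """Parameters of a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_params(k, c_in, c_out):
    """Depthwise k x k filter per input channel + 1x1 pointwise mixing."""
    return k * k * c_in + c_in * c_out

std = standard_params(3, 64, 128)   # 73728
sep = separable_params(3, 64, 128)  # 8768
print(std, sep, round(std / sep, 1))  # separable uses ~8x fewer parameters
```

The roughly k²-fold saving is what makes such layers standard in mobile architectures.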
  • Nosratali Ashrafi-Payaman *, Maryam Khazaei Pages 237-245

    Nowadays, as the use of social networks and computer networks increases, the amount of complex graph-structured data, and of its applications such as classification, clustering, link prediction, and recommender systems, has risen significantly. Because of security problems and societal concerns, anomaly detection has become a vital problem in many fields. Applications that use heterogeneous graphs face many issues, such as different kinds of neighbors, different feature types, and differences in the type and number of links. In this research, we therefore employ the HetGNN model, with some changes in its loss functions and parameters, for heterogeneous graph embedding, capturing both the structure and the content of the graph; the embedding is then passed to a variational autoencoder (VAE) that flags anomalous nodes based on reconstruction error. Our experiments on the AMiner data set against many baselines show that our model outperforms state-of-the-art methods on heterogeneous graphs while considering all types of attributes.

    Keywords: Graph Mining, Graph-based Anomaly Detection, Graph Embedding, Heterogeneous Graph, Graph Neural Network
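The final anomaly-scoring step can be sketched with toy numbers. The embeddings and reconstructions below are hypothetical stand-ins for HetGNN and VAE output: nodes whose reconstruction error exceeds the mean by more than one standard deviation are flagged as anomalous.

```python
import math

def recon_error(x, x_hat):
    """Euclidean distance between an embedding and its reconstruction."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))

# Hypothetical node embeddings and their VAE reconstructions.
embeddings = {"n1": [1.0, 0.0], "n2": [0.9, 0.1], "n3": [0.0, 3.0]}
recons     = {"n1": [0.9, 0.1], "n2": [0.8, 0.2], "n3": [0.5, 0.5]}

errors = {n: recon_error(embeddings[n], recons[n]) for n in embeddings}
mean = sum(errors.values()) / len(errors)
std = math.sqrt(sum((e - mean) ** 2 for e in errors.values()) / len(errors))
anomalies = [n for n, e in errors.items() if e > mean + std]
print(anomalies)  # → ['n3']
```

The intuition is that a VAE trained on normal nodes reconstructs them well, so a large reconstruction error marks a node the model has not learned to explain.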
  • H. Aghabarar, K. Kiani *, P. Keshavarzi Pages 247-257

    Nowadays, given the rapid progress in pattern recognition, tools from theoretical mathematics can be exploited to improve the efficiency of recognition tasks. In this paper, the Discrete Wavelet Transform (DWT) is used as a mathematical framework for handwritten digit recognition in spiking neural networks (SNNs). The motivation is that the wavelet transform can separate spike information and noise into distinct frequency subbands while also preserving timing information. First, the DWT is applied to the MNIST images at the network input. Then a form of temporal coding called constant-current Leaky Integrate-and-Fire (LIF) encoding is applied to the transformed data, and the encoded images are fed to a multilayer convolutional spiking network. Various wavelets have been investigated in this architecture, and the highest classification accuracy achieved is 99.25%. The simulation results show that the DWT is an effective choice that brings the network to an efficiency comparable to previous spiking networks.

    Keywords: Wavelet Transform, digit recognition, convolutional SNN, constant-current-LIF encoding
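The two pre-processing steps can be sketched in one dimension. This is a minimal illustration, not the paper's pipeline: a one-level Haar DWT on a toy signal, followed by constant-current LIF latency coding, in which a neuron driven by a constant current spikes earlier for stronger inputs.

```python
import math

def haar_dwt(signal):
    """One-level Haar transform: pairwise averages (approximation) and
    pairwise differences (detail) of an even-length signal."""
    approx = [(a + b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    detail = [(a - b) / 2 for a, b in zip(signal[::2], signal[1::2])]
    return approx, detail

def lif_spike_time(current, v_th=1.0, tau=10.0):
    """First spike time of a LIF neuron under a constant input current,
    from V(t) = I * (1 - exp(-t / tau)); weak inputs never spike."""
    if current <= v_th:
        return None  # membrane never reaches threshold
    return -tau * math.log(1 - v_th / current)

approx, detail = haar_dwt([4, 2, 6, 6, 1, 3, 8, 0])
print(approx)                          # → [3.0, 6.0, 2.0, 4.0]
times = [lif_spike_time(a) for a in approx]
print(times[1] < times[0] < times[2])  # stronger input spikes first → True
```

The paper applies the same two steps to 2-D MNIST images before the convolutional spiking layers, so larger wavelet coefficients translate into earlier spikes.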
  • Behrooz Shahrokhzadeh *, MohammadHossein Shayesteh, Behrooz Masoumi Pages 259-289

    This paper provides a comprehensive review of the potential of game theory as a solution to challenges in sensor-based human activity recognition (HAR). Game theory is a mathematical framework that models interactions between multiple entities and is used in various fields, including economics, political science, and computer science. In recent years, it has increasingly been applied to machine learning challenges, including HAR, to improve the performance and efficiency of recognition algorithms. The review covers the challenges shared by HAR and machine learning, compares previous work on traditional approaches to HAR, and discusses the potential advantages of using game theory. It examines different game-theoretic approaches, including non-cooperative and cooperative games, and provides insights into how they can improve HAR systems. The authors propose new game theory-based approaches and evaluate their effectiveness against traditional approaches. Overall, this review expands the scope of research in HAR by introducing game-theoretic concepts and solutions to the field and provides valuable insights for researchers interested in applying game-theoretic approaches to HAR.

    Keywords: Machine learning, deep learning, Challenges, Solutions, Opportunities
  • A.R. Mazochi, S. Bourbour, M. R. Ghofrani, S. Momtazi * Pages 291-302

    Geocoding, converting a postal address to a coordinate, is a helpful tool in many applications. Developing a geocoder is difficult for a developing country that does not follow a standard addressing format: the lack of complete reference data and the non-persistence of names are the main challenges, in addition to common natural language processing challenges. In this paper, we propose a geocoder for Persian addresses. To the best of our knowledge, our system, TehranGeocode, is the first geocoder for this language. Given the non-standard structure of Persian addresses, we need to split an address into small segments, find each segment in the reference dataset, and connect the segments to locate the target of the address. We develop our system based on address parsing and dynamic programming for this purpose. We specify the contribution of our work compared to similar studies, discuss the main components of the program, its data, and its results, and show that the proposed framework achieves promising results, finding 83% of addresses with an error of less than 300 meters.

    Keywords: Geocoding, Persian Address, Natural Language Processing, Address Parsing, Viterbi
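The dynamic-programming segmentation idea can be sketched as a word-segmentation problem. The reference names below are hypothetical, and TehranGeocode's actual parser and reference data are far richer: the sketch splits an unsegmented address into the fewest dictionary-known pieces.

```python
def segment(address, names):
    """DP over prefixes: best[i] is the fewest segments covering
    address[:i]; back[i] records where the last segment starts."""
    n = len(address)
    best = [0] + [None] * n
    back = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            if address[j:i] in names and best[j] is not None:
                if best[i] is None or best[j] + 1 < best[i]:
                    best[i], back[i] = best[j] + 1, j
    if best[n] is None:
        return None           # address cannot be fully segmented
    parts, i = [], n
    while i > 0:              # walk the back-pointers to recover segments
        parts.append(address[back[i]:i])
        i = back[i]
    return parts[::-1]

names = {"valiasr", "st", "tehran", "no12"}
print(segment("tehranvaliasrstno12", names))  # → ['tehran', 'valiasr', 'st', 'no12']
```

Replacing the set-membership test with scored matches against a reference dataset turns this into the Viterbi-style search the paper's keywords point to.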
  • Ali Ghorbanian, Hamideh Razavi * Pages 303-314

    In time series clustering, features are typically extracted from the time series and clustered instead of the raw data. However, using the same set of features for all data sets may not be effective. To overcome this limitation, this study proposes a five-step algorithm that extracts a complete set of features, both direct and indirect, for each data set. The algorithm then selects the features essential for clustering using a genetic algorithm and internal clustering criteria. The final clustering is performed with a hierarchical clustering algorithm on the selected features. Applying the algorithm to 81 data sets yields an average Rand index of 72.16%, with on average 38 of the 78 extracted features selected for clustering. Statistical tests comparing this algorithm to four others in the literature confirm its effectiveness.

    Keywords: time series, Clustering, Feature extraction, Feature Selection, data mining
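The extract-then-select idea can be sketched with toy data. This is a minimal illustration only: the paper extracts 78 features per data set and searches subsets with a genetic algorithm, whereas here two features are scored individually with a crude internal criterion.

```python
import statistics

def extract(series):
    """Two illustrative 'direct' features per time series."""
    return {"mean": statistics.mean(series), "std": statistics.pstdev(series)}

def separation(values, split):
    """Crude internal criterion for a 2-cluster split (given as index
    lists): between-group gap over within-group spread."""
    g1 = [values[i] for i in split[0]]
    g2 = [values[i] for i in split[1]]
    gap = abs(statistics.mean(g1) - statistics.mean(g2))
    spread = statistics.pstdev(g1) + statistics.pstdev(g2) + 1e-9
    return gap / spread

series = [[1, 1, 1], [2, 2, 2],        # flat, low-variance group
          [0, 10, 20], [5, 15, 25]]    # rising, high-variance group
feats = [extract(s) for s in series]
split = ([0, 1], [2, 3])
scores = {name: separation([f[name] for f in feats], split)
          for name in ("mean", "std")}
best = max(scores, key=scores.get)
print(best)                            # → std
```

Here the spread feature separates the two groups perfectly, so the criterion selects it; the genetic algorithm generalizes this to scoring whole feature subsets rather than single features.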
  • Maryam Khodabakhsh *, Fahimeh Hafezi Pages 315-329

    Coronavirus disease, a persistent epidemic of acute respiratory syndrome, posed a challenge to global healthcare systems. Many people were forced to stay in their homes by unprecedented quarantine practices around the world. Since most people used social media during the Coronavirus epidemic, analyzing user-generated social content can provide new insights and help track changes and their occurrence over time. An active area in this space is the prediction of newly infected cases from Coronavirus-related social content. Identifying the social content related to Coronavirus is challenging: a significant number of posts contain Coronavirus-related content without Corona-related hashtags or words, while, conversely, other posts carry the hashtag or the word Corona but are not actually about Coronavirus and are mostly promotional. In this paper, we propose a semantic approach based on word embedding techniques to model Corona and introduce a new feature, semantic similarity, that measures the similarity of a given post to Corona in the semantic space. We also propose two other features, fear emotion and hope feeling, to identify Coronavirus-related posts. These features are used as statistical indicators in a regression model to estimate newly infected cases. We evaluate our features on a Persian dataset of Instagram posts collected in the first wave of Coronavirus and demonstrate that the proposed features improve the performance of Coronavirus incidence rate estimation.

    Keywords: Social Media, Semantic similarity, Fear emotion, Hope feeling, Corona Incidence rate estimation
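The semantic-similarity feature can be sketched with tiny made-up word vectors (the paper learns embeddings from real Persian Instagram posts): a post is scored by the cosine similarity between its averaged word vector and the 'corona' concept vector, so a related post scores high even without the keyword, while a promotional one scores low despite containing it.

```python
import math

# Hypothetical 2-D word embeddings; real ones are learned from the corpus.
vectors = {
    "corona":   [0.9, 0.1],
    "vaccine":  [0.8, 0.2],
    "mask":     [0.7, 0.3],
    "discount": [0.1, 0.9],
    "shop":     [0.0, 1.0],
}

def post_vector(words):
    """Average the vectors of a post's in-vocabulary words."""
    dims = len(next(iter(vectors.values())))
    known = [vectors[w] for w in words if w in vectors]
    return [sum(v[d] for v in known) / len(known) for d in range(dims)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

related = cosine(post_vector(["vaccine", "mask"]), vectors["corona"])
promo   = cosine(post_vector(["discount", "shop"]), vectors["corona"])
print(related > promo)  # → True
```

The resulting score, alongside the fear and hope features, feeds the regression model as a statistical indicator of Corona-relatedness.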